Mean-Field Theory for Batched-TD(λ)
Abstract
A representation-independent mean-field dynamics is presented for batched TD(λ). The task is learning to predict the outcome of an indirectly observed absorbing Markov process. In the case of linear representations, the discrete-time deterministic iteration is an affine map whose fixed point can be expressed in closed form without assuming linearly independent observation vectors. Batched linear TD(λ) is proved to converge with probability 1 for all λ. Theory and simulation agree on a random walk example.
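The batched update described in the abstract can be illustrated with a minimal sketch. It is not the paper's code: the five-state walk with outcomes 0 and 1, the one-hot observation vectors, the step size, and all function names are assumptions chosen for illustration. Each sweep accumulates the TD(λ) increments over a fixed set of walks and applies them once, so one sweep is an affine map of the weight vector.

```python
import random

def generate_walk(rng, n_states=5, start=2):
    """Return (visited non-terminal states, outcome z) for one walk.
    Assumed layout: states 0..4, absorbing on the left (z=0) or right (z=1)."""
    s, states = start, []
    while 0 <= s < n_states:
        states.append(s)
        s += rng.choice((-1, 1))
    return states, (1.0 if s >= n_states else 0.0)

def batch_update(w, walks, lam, alpha):
    """One batched sweep: sum TD(lambda) increments over all walks, apply once."""
    delta_w = [0.0] * len(w)
    for states, z in walks:
        trace = [0.0] * len(w)
        for t, s in enumerate(states):
            # Eligibility trace for one-hot features: decay, then add x_t.
            trace = [lam * e for e in trace]
            trace[s] += 1.0
            # The prediction after the last state is the observed outcome z.
            p_next = w[states[t + 1]] if t + 1 < len(states) else z
            err = p_next - w[s]
            for i in range(len(w)):
                delta_w[i] += err * trace[i]
    return [wi + alpha * dwi for wi, dwi in zip(w, delta_w)]

def batched_td_lambda(walks, lam, sweeps=1000, n_states=5):
    """Iterate the batched sweep from w = 0.5 with an assumed small step size."""
    alpha = 0.1 / len(walks)  # assumption: small enough for this example
    w = [0.5] * n_states
    for _ in range(sweeps):
        w = batch_update(w, walks, lam, alpha)
    return w

rng = random.Random(0)
walks = [generate_walk(rng) for _ in range(50)]
results = {lam: batched_td_lambda(walks, lam) for lam in (0.0, 0.5, 1.0)}
for lam, w in results.items():
    print(lam, [round(v, 3) for v in w])
```

With one-hot features and λ = 1, the iteration should settle on the visit-weighted average outcome for each state, which gives a simple sanity check on the sketch.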
Similar resources
On Convergence of some Gradient-based Temporal-Differences Algorithms for Off-Policy Learning
We consider off-policy temporal-difference (TD) learning methods for policy evaluation in Markov decision processes with finite spaces and discounted reward criteria, and we present a collection of convergence results for several gradient-based TD algorithms with linear function approximation. The algorithms we analyze include: (i) two basic forms of two-time-scale gradient-based TD algorithms,...
Linear-Response Time-Dependent Embedded Mean-Field Theory.
We present a time-dependent (TD) linear-response description of excited electronic states within the framework of embedded mean-field theory (EMFT). TD-EMFT allows for subsystems to be described at different mean-field levels of theory, enabling straightforward treatment of excited states and transition properties. We provide benchmark demonstrations of TD-EMFT for both local and nonlocal excit...
An Empirical Evaluation of True Online TD(λ)
The true online TD(λ) algorithm has recently been proposed (van Seijen and Sutton, 2014) as a universal replacement for the popular TD(λ) algorithm, in temporal-difference learning and reinforcement learning. True online TD(λ) has better theoretical properties than conventional TD(λ), and the expectation is that it also results in faster learning. In this paper, we put this hypothesis to the te...
On the MHD Boundary of Kelvin-Helmholtz Stability Diagram at Large Wavelengths
Working within the domain of inviscid incompressible MHD theory, we found that a tangential discontinuity (TD) separating two uniform regions of different density, velocity and magnetic field may be Kelvin-Helmholtz (KH) stable and yet a study of a transition between the same constant regions given by a continuous velocity profile shows the presence of the instability with significant growth ra...
Adaptive Lambda Least-Squares Temporal Difference Learning
Temporal Difference learning or TD(λ) is a fundamental algorithm in the field of reinforcement learning. However, setting TD’s λ parameter, which controls the timescale of TD updates, is generally left up to the practitioner. We formalize the λ selection problem as a bias-variance trade-off where the solution is the value of λ that leads to the smallest Mean Squared Value Error (MSVE). To solve...